Scalable and accurate knowledge discovery in real world databases

Author

  • Martin Scholz
Abstract

…ion: Meta-data are given at two levels of abstraction, a conceptual (abstract) level and a relational (executable) level. This makes an abstract case understandable and reusable.

Data documentation: All attributes, together with the database tables and views that are input to a preprocessing chain, are explicitly listed in both the conceptual and the relational part of the meta-data level. An ontology allows all data to be organized, e.g. by distinguishing between concepts of the domain and relationships between these concepts. Every entity involved has a text field for documentation. This makes the data much more understandable, e.g. for human domain experts, than referring only to the names of specific database objects. Furthermore, statistics and features important for data mining (e.g., the presence of null values) are accessible as well. This augments the meta-data usually found in relational databases and gives a good overview of the data sets at hand.

Case documentation: The chain of preprocessing operators is documented as well. First of all, the declarative definition of an executable case in the M4 model can already be considered a form of documentation. Furthermore, apart from the option of using “speaking names” for steps and data objects, there are text fields to document all steps of a case together with their parameter settings. This helps to quickly figure out the relevance of each step and makes cases reproducible.

Ease of case adaptation: To run a given sequence of operators on a new database, only the relational meta-data and their mapping to the conceptual meta-data have to be defined. A sales prediction case can, for instance, be applied to different kinds of shops, and a standard sequence of steps for preparing time series for a specific learner might even serve as a template that applies to very different mining contexts. The same effect eases the maintenance of cases when the database schema changes over time: the user just needs to update the corresponding links from the conceptual to the relational level. This is especially easy when all abstract M4 entities are documented.

The MININGMART project has developed a model for meta-data together with its compiler, and has implemented human-computer interfaces that allow database managers and case designers …
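The claim that a case written at the conceptual level can be re-targeted by supplying only a new relational mapping can be illustrated with a small Python sketch. It assumes a drastically simplified model: the classes `Concept`, `RelationalBinding` and `Step`, the function `compile_case`, the two operators `drop_nulls` and `min_value`, and the example tables are all hypothetical and are not MiningMart's M4 schema or its compiler. A case is written once against conceptual names (with documentation text fields) and compiled into a chain of SQL views; only the binding differs between the two shops.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for the two meta-data levels described
# above. Class names, fields and operators are illustrative assumptions, not
# the actual M4 schema or the MiningMart compiler.


@dataclass
class Concept:
    """Conceptual level: a documented domain entity with named features."""
    name: str
    features: list[str]
    doc: str = ""


@dataclass
class RelationalBinding:
    """Relational level: where one concept is stored in one database."""
    table: str
    columns: dict[str, str]  # conceptual feature name -> physical column


@dataclass
class Step:
    """A documented preprocessing step that uses only conceptual names."""
    name: str
    operator: str  # hypothetical operators: "drop_nulls", "min_value"
    feature: str
    params: dict = field(default_factory=dict)
    doc: str = ""


def compile_case(concept, steps, binding):
    """Translate a chain of conceptual steps into SQL views for one database."""
    # Every conceptual feature must be mapped before the case can run.
    missing = [f for f in concept.features if f not in binding.columns]
    if missing:
        raise ValueError(f"unmapped features: {missing}")
    statements, source = [], binding.table
    for step in steps:
        if step.feature not in concept.features:
            raise ValueError(f"{step.name}: unknown feature {step.feature!r}")
        col = binding.columns[step.feature]
        if step.operator == "drop_nulls":
            where = f"{col} IS NOT NULL"
        elif step.operator == "min_value":
            # numeric threshold taken from the step's parameter settings
            where = f"{col} >= {float(step.params['value'])}"
        else:
            raise ValueError(f"{step.name}: unknown operator {step.operator!r}")
        view = f"v_{step.name}"
        statements.append(f"CREATE VIEW {view} AS SELECT * FROM {source} WHERE {where}")
        source = view  # the next step reads the output of this one
    return statements


# The same two-step case, written once, bound to two differently named schemas.
sales = Concept("Sales", ["day", "amount"], doc="Daily sales figures of one shop")
case = [
    Step("clean", "drop_nulls", "amount", doc="Drop days without a recorded amount"),
    Step("nonneg", "min_value", "amount", {"value": 0}, doc="Drop negative corrections"),
]
shop_a = RelationalBinding("SALES_A", {"day": "DAY", "amount": "REVENUE"})
shop_b = RelationalBinding("tx_daily", {"day": "d", "amount": "total_eur"})

for sql in compile_case(sales, case, shop_a):
    print(sql)

# Execute the compiled chain for shop B against a small in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tx_daily (d TEXT, total_eur REAL)")
con.executemany(
    "INSERT INTO tx_daily VALUES (?, ?)",
    [("2007-01-01", 120.0), ("2007-01-02", None), ("2007-01-03", -5.0)],
)
for sql in compile_case(sales, case, shop_b):
    con.execute(sql)
print(con.execute("SELECT COUNT(*) FROM v_nonneg").fetchone()[0])  # prints 1
```

In this sketch, re-targeting the case to a third shop would only require another `RelationalBinding`; the `Concept` and `Step` objects, including their `doc` fields, stay unchanged, which mirrors the kind of reuse the abstract attributes to separating the two levels.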

Similar resources

Data Mining & Knowledge Discovery in Databases: An AI Perspective

Data mining and knowledge discovery have several important application areas. Data mining and knowledge discovery have been topics considered at many AI, database and statistical conferences. Knowledge discovery generally refers to the process of identifying valid, novel and understandable patterns. Knowledge discovery from large databases, often called data mining, refers to the application of ...

Discovery of Data Dependencies in Relational Databases Lss8 Report 14

Knowledge discovery in databases is not only the nontrivial extraction of implicit, previously unknown and potentially useful information from databases. We argue that in contrast to machine learning, knowledge discovery in databases should be applied to real-world databases. Since real-world databases are known to be very large, they raise problems of access. Therefore, real world database...

From Data Mining to Knowledge Discovery in Databases

…databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particula...

The Status of Research on Rough Sets for Knowledge Discovery in Databases

Knowledge Discovery in Databases (KDD) has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of KDD have been investigated in several related fields. The emphasis of ongoing res...

An Intelligent Approach of Rough Set in Knowledge Discovery Databases

Knowledge Discovery in Databases (KDD) has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Rough Set Theory (RST) is a mathematical formalism for representing uncertainty that can be consi...

Darwin: A Scalable Integrated System for Data Mining

Darwin is a high-performance scalable integrated system for Data Mining and Knowledge Discovery in large databases. In this paper we present an overview of Darwin’s philosophy, architecture and functionality. We also describe the application of Darwin to selected datasets.

Publication date: 2007